A Jacobian inequality for gradient maps on the 
sphere and its apphcation to directional statistics 
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Abstract 

In the field of optimal transport theory, an optimal map is known to be a 
gradient map of a potential function satisfying cost-convexity. In this paper, 
the Jacobian determinant of a gradient map is shown to be log-concave with 
respect to a convex combination of the potential functions when the underlying 
manifold is the sphere and the cost function is the distance squared. The 
proof uses the non-negative cross-curvature property of the sphere recently 
established by Kim and McCann, and Figalli and Rifford. As an application 
to statistics, a new family of probability densities on the sphere is defined in 
terms of cost-convex functions. The log-concave property of the likelihood 
function follows from the inequality. 



1 Introduction 

In recent years, the theory of optimal transport has been actively studied. In par- 
ticular, properties of the optimal transport map on Riemannian manifolds are well 
established. The existence and uniquene ss theorem for t he optimal transport map 

McCann 



on Riemannian ma nifolds was pro ved by 



this result extended the 



pioneering work of iBrenierl ( 1l99ll ) for the Euclidean case. He showed that opti- 
mal transport is given by the gradient map of a so-called cost-convex function. 
On the other hand, for statistical data analysis on Euclidean space, it is useful 



to consider convex combinations o: 



convex functions in order to construct various 



probability density functions ( ISeil (|2006[ ). ISeil ( 120091 )). In this paper, we show that 



when the underlying space is the sphere, the convex combination of cost-convex 
functions is actually cost-convex (Lemma [I]) and the Jacobian determinant of the 
resultant gradient map is log-concave with respect to the convex combination (The- 
or em [1]) . This result is an extensio n of the Jacobian interpolation inequality shown 
by ICordero-Erausquin et al.l (l200l[ ). We refer to our Jacobian inequality as the Ja- 
cobian inequality throughout this paper, for simplicity. 
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Our result is related to the regularity th eory of opt i mal t ransport maps. Here we 



consider some recent studies in this field. 



MaetaL 



(120051 ) showed that regularity 



of the transport map for general cost functions on Euclidean space is assured if 
a georn e trical quantity called the cost- sectional curvature is positive. Conversely, 
Loepeii ( I2OO5I ) showed that non-negativity of the cost-sectional curvature is neces- 
sary for regularity. He also showed that non-negativity of the cost-sectional cur- 
vature implies non-negativity of the usual sectional curvature if the cost function 
is the sq uared distan ce on a Riemannian manifold. However, the converse does 
not hold ( KimI ( 2007 )). Comprehensive assessment on the theory of optimal trans- 



( iKim and McCannI (120071 )) 
independently showed that the sphere 5" has almost positive cross-curvature. In 
general, the cost-sectional curvature is non-negative if the cross-curvature is non- 
negative. In the present paper, we use the non-negative cross- curvature property of 
the sphere to prove our main results. 

We show that our Jacobian inequality opens several doors for applications to 
directional statistics. In this field, a family of probability densities is used to analyze 
given directional data, such as locations on the earth. For example, a test on the 
directional character of given data is constructed via families of prob ability density 
functions on the sphere. Directional statistics has a long history s ince 



port has been publ i shed (IVillanil (120091)'). A re l evant con cept is the cross-curvature 



Kim and McCann 



and 



FigaUi and Riffordl toom 



Fishei ( 



1953) 



and a comprehensive text on this subject has been published (IMardia and Jupp 
( I2OO0I )). 

We define a probability density function on the sphere by the gradient maps of 
cost-convex functions. Although, in the context of optimal transport, one usually 
considers push-forward of probability densities, we construct a family of densities 
by means of pull-back of probability densities. This follows from the fact that 
a pull-back density has an explicit expression for the likelihood function needed 
for statistical analysis. The density function does not need any special functions 
such as the modified Bessel function, which usually appear in directional statistics. 
Furthermore, the Jacobian inequality implies that the likelihood function is log- 
concave with respect to the statistical parameters. This property is reasonable for 
computation of the maximum likelihood estimator. We propose more specific models 
and show graphical images of each probability density. In terms of analysis of real 
data, we present the result of density estimation for some astronomical data. 

This paper is organized as follows. In Section |2l we present basic notation and 
state our main theorem. In Section [3l we construct a family of probability density 
functions on the sphere and apply them to directional statistics. All mathematical 



2 



proofs of the main theorem and lemmas are given in Section |H Finally we present 
a discussion in Section |5l 



2 Main theorem 



Let 5'"' be the n-dimensional unit sphere. The tangent space at x G 5*" is denoted by 
Tj.S'". The geodesic distance (arc length) between x and y in 5" is denoted by d{x, y). 
The cost function is c{x,y) = {l/2)d{x,yY . If one uses Euchdean coordinates in 
W^^^ to express S'", then d{x,y) = cos~^{x^y), where the range of cos""*^ is [0,7r]. 
The c-transform (j)'^ of a function : S*" — )■ M is defined by 



(p^iy) = sup {-c(x,y) - 0(x)} 



(1) 



The function (p is said to be cost-convex, or c-convex, if {(p'^Y = (p. Examples of 
c-convex functions will be given in Section |3l By compactness of S*", a function 
is c-convex if and only if for any a; G S*" there exists some (not necessarily unique) 
y G 5" such that c{x, y) + (p{x) = mfz^s"{c{z, y) + (p{z)}. 

The image of the exponential map of f G T^S"^ at x G S", denoted by exp^(t>), is 
the end point of the geodesic starting at x with the initial vector v. More explicitly, if 
one uses Euclidean coordinates in W^^^ to express S*" and T^S"', the exponential map 
is written as exp^(t>) = (cos \v\)x + (sin |t>|)(t>/|f |), where \v\ denotes the Euclidean 
norm of the vector v. The exponential map exp^ is a diffeomorphism from {v G 
TxS'^ I |w| < tt} to 5" \ {x'}, where x' is the antipodal point of x. 

The following lemma is a consequence of the non-ne gati ve cross-curvature p r opert y 
of the sphere established by lKim and McCannI (120081 ) and lFigalli and Rifford 
See Section H] for a proof. 

Lemma 1 (Convex combination of c-convex functions). If (po and 0i are c-convex, 
then for each t G [0, 1] the function (1 — t)(po{x) + t(pi{x) of x is also c-convex. 



Remark 1. 



Figalli et al.l (l2009[ l showed Lemma [T] simultaneously and independently 



from us. Indeed, they showed more general result, in that the convexity of the space 
of c-convex functions is necessary and su fficient condi t ion fo r the non-negative cross- 
curvature property (see Theorem 3.2 in 



FigaUi et al.l (|2009|)) 



We define G(i,{x) = exp^.(V 0(x)) as long as (p is differe ntiable at x, where V is the 
gradient operator. Following iDelanoe and Loeperl ( l2006l ). we call : S"" — )■ S"" the 
gradient map associated with the potential function (p. The map is differentiable 
at X if |V0(a;)| < tt and (p has its Hessian at x. It is known that any c-convex (p on 
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any compact Riemannian manifold is Lipschitz and therefore differentiable almost 
everywhere. Furthermore, has a Hessian almost everywhere in the Alex a ndroy 



sens e, and therefore G^(x) is differe ntiable almost everywhere (see iMcCannI (1200 ll ) 



and 



Cordero-Erausquin et al.l (120011 )). These technical facts on differentiability are 
important for the theory of optimal transport. However, we will not need them 
because, for statistical applications, we can assume from the beginning that G^{x) 
is differentiable except at a finite set of points (see Section [3]). 

For any c-convex functions (po and 0i, by Lemma [Tj the convex combination 
4>t{x) = (1 — t)(f)Q{x) + t(f)i{x) is c-convex. We define an interpolation of gradient 
maps by 

Ft{x) = G^,{x) = exp,(V(/)t(x)), tG[0,l]. 

Assume that for each i G {0, 1}, | V0j(x)| < vr and 0i(x) has its Hessian at x. Then it 
is easy to see that, for any t G [0, 1], |V0t(x)| < vr and 4>t{x) has its Hessian defined 
at X. We define the Jacobian determinant Jt{x) = Jac(-Ft(x)) = det{dFt/dx) with 
respect to any orthonormal basis on T^S*" and Tpj(a;)S'" with suitable orientations. 
The following theorem is our main result. 

Theorem 1 (Jacobian inequality). Let (pQ and (f)i be two c-convex functions. Let 
X be a point in S"" such that, for each i = 0,1, |V0j(x)| < vr and (pi has its Hessian 
defined at x. Then the Jacobian determinant Jt{x) defined above is log-concave 
with respect to t. It is equivalent to the inequality 

logJt(x) > (l-t)logJo(x)+tlogJi(x), tG[0,l]. 

We refer to the above inequality as the Jacobian inequality in this paper. 



Rema rk 2. This theorem is an extension of the result obtained by lCordero-Erausquin et al 



(12001 



They showed a similar inequality under the additional assumption that 
00 = 0, as a corollary of a stronger inequality related to the geometric- arithmetic 
inequality. It is not known whether the stronger one holds for our case 0o ^ (see 
also Remark [5]). □ 

3 Application to directional statistics 

3.1 Probability densities induced by gradient maps 



In ISeil (l2006l ) and ISeil (l2009l ). the author proposed a family of probability density 
functions in terms of gradient maps on Euclidean space, where a probability density 
is constructed as a pull-back of some fixed measure (typically Gaussian) pulled by a 
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gradient map. The notion can be directly extended to probability density functions 
on the sphere. 

For statistical application, we will consider only c-convex functions (j) such that the 
gradient map is an isomorphism on and has its Hessian defined everywhere 
except at a finite set of points. We define some related terminology. 

Definition 1 (Wrapping potential function). We say that a function is a wrapping 
potential function if (f) is c-convex, has its Hessian defined everywhere except for 
a finite set of points and is an isomorphism on S^. Let W{S^) be the set of all 
wrapping potential functions. 

We have the following lemma. 

Lemma 2. If 0o ^-nd 0i are in W{S^), then the interpolation 0^ = (1 — t)0o + t0i 
{t G [0,1]) is also in W{S''). 

We construct a probability density function for each G W{S'^). Let be a 
random variable on distributed uniformly. Then, since x t— )■ G^{x) is bijective, 
we can define a random variable on S*" by X = G^^{U). The probability density 
function of X with respect to the uniform measure is P(/,(x) = Jac(G(^(x)), where 
the symbol Jac refers to the Jacobian determinant. In other words, we define p,f,{x) 
by the pull-hack measure of the uniform measure pulled by the gradient map G^. 

At this point, we describe the exact sampling method of the probability density 
function ^^(x). A sampling procedure is important if one needs to calculate expec- 
tations by the Monte Carlo method. From the definition, it is clear that the random 
variable X = G^^{U) with a uniformly random variable U on S*" has density P(t>{x). 
Hence if we can generate U and solve the equation G^{X) = U effectively, we obtain 
a random sample X. Indeed, U is quite easily generated, for example, by normal- 
ization of a standard Gaussian sample in IR"^^. To solve G^plX) = U, it is sufficient 
to find the unique minimizer of the function c{x, U) + 0(x) with respect to x since 
the following lemma holds. 



Lemma 3. [Lemma 7 of lMcCanru (120011 )] If is c-convex and u = G'(^(xo) is defined 



at Xq G S*", then the unique minimizer of c{x,u) + 0(x) with respect to x is Xq. 

Thus our task is to solve the (deterministic) minimization problem. Although the 
minimization problem of c(x, U) + (f){x) is not convex in the usual sense, the objective 
function has no local minimum, by c-convexity. Hence the problem is efficiently 
solved by generic optimization packages. An example of sampling is illustrated in 
Figure [TJ 
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3.2 Spherical gradient model 

We consider a finite-dimensional set of probability densities on the sphere. In statis- 
tics, a finite-dimensional set of probability densities is called a statistical model. An 
unknown parameter 6 that parameterizes the density functions is estimated from 
observed data points x{l), . . . ,x{N) G 5". One of the most important estima- 
tors is the maximum likelihood estimator that maximizes the likelihood function 
Y[tLiP{^{'t)\^) with respect to 6. 

We construct a new statistical model using c-convex functions. Recall that the 
set W{S"') of wrapping potential functions is a convex space (Lemma [2]). We can 
consider a finite-dimensional subspace as follows. Let G W{S"') ioi i = 1, . . . ,p. 
Define 

p 

where 6 = ranges over a convex subset G of M.^ such that (pg G W{S^) for 

any G 0. By Lemma [2] and the elementary fact that G W{S^), we can use 
the simplex {6 \ 6i > 0, ^^^^ 6*4 < 1} as O. Let p{x\6) be the probability density 
function induced by 4>g{x) G W^S"^), that is, 

p{x\e) = Jac(G^,(x)). (2) 

We call the family ([2]) the spherical gradient model. 

The maximum likelihood estimator for the spherical gradient model is reason- 
ably computed by the following corollary of Theorem [TJ 

Corollary 1. Define p{x\6) by (j2]). Then, for any data points . . . ,x{N) G S*", 
the likelihood function Y[k=iPi^W\^) log-concave with respect to 9. 

Remark 3. As an anonymous referee pointed out, in Eu clidean space, there ar e 



results on conv e xity a long generalized geodesies (Chapter 9 of lAmbrosio et al.l ( 120051 ): 



see also IVillanil ( l2009l )). Here a generahzed geodesic is defined by the set of measures 
pushed forward by the gradient maps {G^^}t(zio,i]- On the other hand, we consider 
pull-back measures in this paper as, for example, ([2]) indicates. 

3.3 Examples 

We give some examples of the spherical gradient model ([2]). Recall d{x,y) denotes 
the length between x and ?/ on 5". All the examples are combinations of rotation- 
ally symmetric functions f{d{x,z)), where z E S"' and / G C^([0,7r]). The k-th 
derivative of / is denoted by f^''\ The following lemma is fundamental. 
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Lemma 4. Assume that f^^\0) = /'■^-'(tt) = and f^'^\r) > —1 for almost all 
r G [0, vr]. Then for each z & S"' the function f{d{x, z)) of x is in W{S^). 

Let J-" be the set of functions on [0,7r] that satisfy the assumption in Lemma |H 
Choose p pairs {(/«, Zi)}^^-^ from J-" x S*". Then we can define the spherical gradient 
model ([2]) with 

= = G 0, (3) 

i=l 

where G is a convex subset of MP such that (pe G W{S"') for all G O. 

Remark 4. If p = 1, the resultant density p{x\6) is a function of d{x, z) for some z G 
S"". In directional statistics, such a probability density function is called rotationally 
symmetric. □ 

We briefly touch on known distributions on the sphere in statistics. A very well- 
known distribution on the sphere is the von Mises-Fisher distribution defined by 

V 2 y T{{n + l)/2)J(„+i)/2_i(|/i|) 

in Euclidean coordinates of where fi G M"+^ and 1^ denotes the modified 

Bessel function of the first kind and order u. A more general distribution is the 
Fisher- Bingham distribution defined by 

p{x\fi,A) = — — exp (^fjJ X + x~^ Ax) , (5) 



where a{fi, A) is a normalizing factor to ensure that J p{x\n, A)dx = 1. See lMardia and Jupp 
hood ) for details. 

We return to our spherical gradient model ([2]) with (I3j). The following explicit 
formula due to a general expression (fTTI) is useful for practical implementation: 

p{x\e) = {sm\vg\/\ve\T-'det(xx'^ + He + J2^^^^ 



i=l 



Zj X COS cy.j 



Ve = -y^Oif-{ai)ei, at 

1=1 ' 

He = eeCg H [I - xx - e^eg ] , ee = ve \ve\, ae = \ve\ 

sm ae 

fin N T , COS a» T T\ 

r^i = Ji H : [I -XX - ejCj ) , 

sm a,- ^ ' 
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where Euclidean coordinates in W^~^-^ are used. We remark that the above formula 
needs no special function, unlike the von Mises-Fisher distribution (jl]) or the Fisher- 
Bingham distribution ([5]). 

We give examples of pairs {fi,Zi). Recall that W{S^) is the set of all wrapping 
potential functions. 

Example 1 (Linear potential). Let fi{^) = cos(.^) for all i. We use Euclidean 
coordinates in M"+^ to express S". Then (pei^x) = Yl^i=i ^« cos{d{x, Zi)) = ^f^^ OiX~^Zi 
is in W{S"') as long as XliLi l^'l — ^- deduce that a potential function 4>^{x) := 
fi^x is in W^S"") if |yu| < 1. The parameter /i determines the direction and magnitude 
of concentration. That is, the resultant density function takes larger values at x 
when — is closer to x and is larger, where the negative sign of — /i/|/i| is 
needed because our model is defined by the pull-back measure. We call 0^ the linear 
potential and the resultant statistical model the linear-potential model. This model 
is rotationally-symmetric (see Remark Hj). An example is given in Figure [2] (a). 

Example 2 (Quadratic potential). Consider /j(0 = cos(^) for i = and 
fiiO = cos(2^)/4 ioT i = pi + 1, . . . ,p. Then the potential can be written as 

PI P n 

1=1 j=pi+l 

Let e and A G Sym(M'"+^). Let |v4|i denote the trace norm of A defined 

by the sum of absolute eigenvalues of A. This is actually a norm because |y4|i = 
max_7^B^/ tr[Ai?]. Then we deduce that a potential function 

4>fi,A{x) = x^ li + -x^ Ax (6) 

is in W{S'^) if (/i. A) satisfies + < 1. We call the model the quadratic-potential 
model. Various numerical examples of the quadratic potential model are given in 
Figure El Note that the representation of A includes redundancy because x'^ x = 1. 
It will be tractable if one sets tiA = 0. However, in general this restriction strictly 
reduces the size of the set. For example, the matrix A = diag(0.2, 0, —0.8) has norm 
\A\i = 1 but the trace-adjusted one B = diag(0.4, 0.2, —0.6) has \B\i = 1.2 > 1. 

Example 3 (High-frequency potential) . As a generalization of the above examples, 
we consider fi{^) = k^"^ cos(fci^) for a positive integer ki. If Z = {zi, . . . , Zp) E (5")^ 
and K = {ki, . . . , kp) G Z^q are given, we obtain a potential 

p 

(peix) = ^6ik:r^ cos{kid{x,Zi)). (7) 

i=l 
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We call this model the high-frequency model. Various numerical examples of the 
high-frequency model are given in Figure |3l The density function used in Figure [1] 
belongs to this class. 



3.4 An actual data set 

Here we give a brief analysis of some astronomical data. The data consist of the 
locations of 188 stars of magnitude brighter than or equal to 3.0. The data is 
available from the Bright Star Catalog (5th Revised Ed.) distributed from the 
Astronomical Data Center. We simply compare the quadratic model and the null 
model (uniform distribution) by using Akaike's Information Criterion (AIC). In 
general, AIC for a statistical model is defined by the sum of (—2) times the maximum 
log-likelihood and 2 times the parameter dimension. It is recomme nded to select 
the statistical model minimizing AIC from a set of candidates. See Akaike ( 19741 ) 
for details of AIC. 

The estimated parameter for the quadratic model is 

/t = (0.010,0.017,0.091)^ and A = 0. 173^1^1^ - 0.250z2^J, 

where zi = (0.731,0.048,-0.681)^ and Z2 = (0.544, 0.562, 0.623)^. The maximum 
log-likelihood is 12.5. Since the number of unknown parameters is 8, AIC is —9.0. 
On the other hand, the likelihood of the null model (uniform distribution) is zero and 
AIC is also zero. Therefore, we select the quadratic model from the two candidates. 
Figure S] shows the observed data and the estimated density. 

4 Proofs 

4.1 Proofs of Lemma [1] 



We use the following lemma due to Proposition 6 of iMcCannI ( 120011 ). The lemma 
can also proved by direct calculation for the sphere. Recall c{x,y) = d{x,yY/2. 

Lemma 5 (Inverse of the exponential map). Let x,y & S"" and assume d{x, y) < vr. 
Then Vxc{x,y) = —exp~^{y), where denotes the gradient operator with respect 
to X. 



We fi rst recall the cross-curvatu re non-negativity of the sphere (IKim and McCann 



( 120081 ). iFigalli and RiffordI (120091 )). For simplicity, the definitions below are special- 
ized for the sphere. For a given triplet {x,y,z) G (S"")^ with d{x,z) < vr and 
d{y,z) < vr, the curve 

{exp,{{l-t)exp;\x)+texp-\y))\te[0,l]} 
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is called a c-segment connecting x and y with respect to z. We denote the c- 
segment by [x,y]t{z) in this paper. For given x,y G S'^ with d{x,y) < vr, let as 
and Tt be smooth curves such that ctq = x and tq = y. We assume that either 
c"s = [c"05 or Tt = [tq, Ti]t{x). Note that only one of the two curves is assumed 

to be a c-segment. Then the cross- curvature S is well defined by 



ds'^ df^ 



s=0,t=0 



where ^ = das/ds\s=o and rj = dTt/dt\t=Q. For a given quadruplet {x, z,yo,yi) G 
(S*")^, t/ie sliding mountain is defined by a function 

t ^ c(2;, [?/o,l/i]t(^)) - c(x, [|/o,Z/i]t(2))- (8) 



We use the following fact proved by lKim and McCannI ( 120081 ) and lFigalli and Rifford 
( 120091 ). 

Lemma 6 (Cross-curvature non- negativity). For the sphere, the cross-curvature 
S{x,y){C,,ri) is non-negative for any {x,y,^,r]) with d{x,y) < n. 



Although the following lemma is essentially due to iKim and McCannI (120081 ) . we 
derive it from Lemma |6] for completeness. 

Lemma 7 (Time-convex-sliding-mountain). Let z be a point in 5" and let yo and 
yi be two points in 5" different from the antipodal point of z. Then for any a; G 5*" 
the sliding-mountain (jH]) is convex with respect to t G [0, 1]. 

Proof. Denote the sliding-mountain (]8]) by f{t). Fix t G (0,1) and denote y = 
[yQ,yi]t{z) for simplicity. We first assume y is not the antipodal point of x and 
prove d'^f{t)/dt^ > 0. Let as = [z,x]s{y) and r„ = [y,yi]u{z). Note that as is a 
c-segment with respect to tq = y. Then from Lemma [6l we have 

c/2 , 



> 



n=0 



ds"^ du^ 

for each s G [0, 1]. On the other hand, by Lemma [5l 
^c{as,Tu) = -(^,expJ^(T:J) 

s=0 

= -(^, (l-M)exp;^(y)+Mexp;^(?/i)), 

where ^ = {das/ds)\s=o and (■, ■) denotes the inner product on T^S"'. We obtain 

d d^ 



(9) 



ds du' 



0. 



(10) 



s=0,u=0 
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Integrating both sides of (Q with respect to s twice and using ffTOj) . we have 



^2 



rf2 



> 0. 



u=0 

■2 - 



Since r„ = = [?/o, yi]t+u(i-t)(^), we have (Pf{t)/dt^ > 0. Next we assume 

that y is the antipodal point of x. By assumption, y is not the antipodal point of 
z. By direct calculation, we have 



s^t+o ds s^t-o ds 
Therefore f{t) is convex over t G [0, 1]. 



2tt 



d[yo,yMz) 



dt 



> 0. 



□ 



Now we use Lemma [7] to prove Lemma [H For any point x G S*", we denote the 
antipodal point of x hj x'. Since 0o and 0i are c-convex, there exist functions 0^ 
and (f)l such that = sup^ {— c(x, — 0^ (y)} {i = 0, 1). Then we have 

= {1 - t)(f)oix) + t(f)i{x) 

= sup{-(l -t)c{x,yo) - (1 -t)0o(yo)} + sup{-te(x,yi) 
I/O yi 

= sup sup {-(1 - t)c{x, yo) - tc(x, yi) - (1 - t)0o(?/o) - ) 

where the last equality follows from continuity of c and (p^ {i = 0, 1). Now we consider 
a c-segment [yo, yi]t{z) and denote it by yt{z) for simplicity. From Lemma[71 we have 

- (1 - t)c{x,yo) - tc{x,yi) 

= sup {-c{x,yt{z)) + c{z,yt{z)) - {l-t)c{z,yo) -tc{z,yi)} , 

where the supremum of the right hand side is attained at 2; = x. Hence 



(1 -t)c{z,yo) -tc{z,yi) 



(f)t{x) = sup sup {-c{x,yt{z)) + c{z,yt{z)) 

yo,yi7^x' zj^yl^,y[ 

-(l-t)0S(l/o)-t0^(l/l)} 

= sup{-c(x,ti;) -^(w)}, 

w 

where ^ is defined by an infimum convolution 



^{w) := inf {-c{z,w) + {1 -t)c{z,yo) +tc{z,yi) 

{yo,yi,z)\yo,yi7^x' ,yt{z)=w 

+ (l-t)0S(yo)+t0^(Z/l)}. 

Since (pt is written in the form of a c-transform, it is c-convex. This proves Lemma [TJ 
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4.2 Proof of Theorem [T] 

For each c-convex function 0, let be the set of points x such that |V0(x)| < vr 
and (f) has its Hessian defined at x. If is a wrapping potential function (Defini- 
tion [1]), then S"- \ consists only of a finite set of points. 



The following lemma is essentially proved in iDelanoe and Loeperl ([20061) 



Lemma 8. If is c-convex, then |V0(x)| < vr except for at most one x G 
Furthermore, if |V0(a;)| > n for some x, then G^iy) = G^{x) for any y G fi(0). 

Proof. Let be c-convex. Assume that there exists x G ^^(0) such that |V0(x)| > vr. 
In general, any c-convex function on a compact Riemannian manifold is Lipschitz 
continuous wi th Lipsch i tz con stant less than or equal to the diameter of the manifold 
(Lemma 2 of McCann (j2001 )). Since the diameter of the sphere S"" is vr, we have 



|V0(x)| = vr. Hence Grf){x) is the antipodal point x' of x. We now prove that 
G(j){y) = x' for all y G ^2(0). We use 2-monotonicity of the gradient map: 

d^{x,G^{x)) + d^{y,G^{y)) < (f{x,G^{y)) + d^{y,G^{x)), 

where d{x, y) is the distance between x and y. The above inequality follows from 
Lemma [31 Let a = d{x,G^{y)), h = d{y,x') and c = d{y,G^{y)). Then we have 
n'^ + c^ < a^ + h"^ . By the triangle inequality with respect to the triangle (x, y, G^{y)), 
we have c > \a + b — tt\. Therefore 

> TT^ + c^ - - 6^ 

> tt"^ + (a + b - Tcf - a"^ - 
= 2(7r - a)(7r - 6). 

This implies a = vr or 6 = vr; equivalently, G^{y) = x' or y = x. Hence we have 
G(i,{y) = x' for any y G fi(0). Then |V0(?/)| < vr for any y x from the definition 
of G^. □ 

We proceed to the proof of Theorem [H Fix two c-convex functions 0o and 0i and 
let 0f = (1 — t)0o + t0i for t G [0, 1]. Let x G ^^(0o) H fi(0i). Then it is easy to 
see that x G fi(0i) for any t G [0, 1]. Recall that the gradient map of 0* is denoted 
by Ft{x) = exp^.(V0t(x)). Note that Ft{x) is a c-segment [Fo{x) , Fi{x)]t{x) . We 
prepare some notation to represent an explicit formula of the Jacobian determinant 
of Ft{x). Let at{x) be the Jacobian determinant of the exponential map at V0f(x), 
i.e. Cti^x) = det{(i(exp^.(t>))/(it>}|^=V0t{a;)? where the determinant is calculated with 
respect to any orthonormal bases. Denote the Hessian operator at x by Hess^; and 
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let Ht{x) = (B.esSxc{x,y))y=F^(^x)- Then, by Cordero-Erausquin et al. (2001), the 
Jacobian determinant of Ft{x) is 

Jt{x) = crt{x) det {Ht{x) + B.essr,(j)t) ■ (H) 

Lemma 9. Let x G ^{(po) H Q{(f)i). The matrix-valued function Ht{x) is concave 
with respect to t: 

Ht{x) ^ (1 - t)HQ{x) + tHi{x) for any t G [0, 1], 

where B means that yl — i? is non-negative definite. 

Proof. Since Ft{x) is a c-segment [Fo{x) , Fi{x)]t{x) , Lemma [7] implies that 

c{w,Ft{x)) - c{x,Ft{x)) 
> (1 - t){c{w, Fo{x)) - c(x, Fo(x))} + t{c{w, F,{x)) - c(x, F,{x))} 

for all w G S*". By taking the Hessian with respect to w at w = x, we obtain 

Ressy,c{w , Ft{x))\n]=x ^ (1 - t)Hess^c(w, Fo(x))|^=^ + tHess^c(w, Fi(x))|^=3;. 

This means Ht{x) ^ (1 - t)//o(a;) + tHi{x). □ 

Lemma 10 (Jacobian-ratio inequality). Let x G ^^(0o) nfi(0i). Then the following 
inequality holds: 

^)"">a_,(^)'\,(^)-'". a„ 

Proof. By the formula ffTTl) . it is sufficient to prove that det^^"(iff + Hess^^^j) is con- 
cave with respect to t. Indeed, by Lemma |9] and the geometric-arithmetic inequality 
on det^'''^, we obtain 



det^/"(iJt + Hess, 

> det^/" {(l -t)i/o + t^^i + Hess,0f} 

= det^/" {(1 - t){Ho + Hess,0o) + t{Hi + Hess,.0i)} 

> (1 - t)det^/"(ffo + Hess,0o) + Met^/"(/Ji + Hess^0i). 

Hence det^/"(//t + Hess^0 t) is concave. □ 

Remark 5. li (po = 0, the inequality ( !T2|) is similar to the Jacobian inequality, due 
to Cordero-Erausquin et al. (2001). They showed that if 0o = 0, 

Mx)'^" > (l-t)t;i-i(Fi(x),x)i/" + t[t;i(x,Fi(x))]i/"Ji(x)i/", (13) 
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where Vt{x,y) denotes the volume distortion coefficient (see Cordero-Erausquin et 
al. 2001 for details). The inequality ( !T3|) is crucial to prove a Brunn-Minkowskii-type 
inequality on manifolds. However, since the inequality (fT3l) is only established for the 
special case 0o = 0, it is not sufficient for our statistical application. Unfortunately, 
(fT3!) is not implied from (fT2|) . In fact, if (poi^) = 0, then Jo{x) = 1 and cro(x) = 1, 
and the inequality f|T2|) reduces to 



This inequality is weaker than ( !T3|) because vi-t{Fi{x) , x) > 1 > (Tt{x) and vt{x, Fi{x)) = 



Lemma 11. For any x G ^{(po) H ^{4>i)y ^ogat{x) is concave with respect to t. 

Proof. For the unit sphere S*", the Jacobian determinant of the exponential map 
is given by (sin |t>|/|t>|)"'~"'". Therefore at{x) = (sin | V0f(x)|/|V0t(x)|)"'~"'". Since 
the function [0,7r] 3 p ^ log(sinp/p) is decreasing and concave, and since the 
map t I— 7- |V0t(a;)| is convex with respect to t, we deduce that the composite map 
\ogat{x) = log(sin |V0t(x)|/| V0t(x)|) is concave. □ 

Now we prove Theorem[Tl By Lemma [TO] and Lemma [TT| the functions log{Jt{x) / at{x)) 
and log(T((x) are concave with respect to t. Hence log Jf(a:) is also concave. 

4.3 Proof of Lemma [2] 

Recall that 14^(5'") is the set of c-convex functions such that the gradient map 
is an isomorphism on S*" and (p has its Hessian defined everywhere except at a finite 
set of points. 

Lemma 12. Let be a c-convex function and differentiable. Then is injective 
if and only if c(x, G^{x)) + c{z, Gff,{z)) < c(x, G^{z)) + c{z, G^{x)) for any x ^ z. 

Proof. In general, by Lemma [31 2-monotonicity 



c{x,G^{x)) + c{z,G^{z)) < c{z,G^{x)) + c{x,G^{z)) 
holds for any x and z, where equality holds if and only if G'</,(x) = G^{z). The result 



Lemma 13. Let 0o and 0i be members of W{S"-'). Then, for any t G [0,1], the 
gradient map Ft{x) = exp^.(V0t(x)) is injective. 




at{x)/ai{x). 



□ 



follows immediately. 



□ 
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Proof. Put ht{x, z) = c{x, Ft{x))+c{z, Ft{z))—c{x, Ft{z))—c{z, Ft{x)). By Lemma [T2l 
it is sufficient to show that ht{x, 2;) < for any t G [0, 1] and x ^ z. By the assump- 
tion and Lemma [T2l we have ho{x, z) < and hi{x, z) < 0. On the other hand, by 
Lemma [TJ ht{x,z) < (1 — t)ho{x, z) + thi{x, z). Hence we obtain ht{x,z) < 0. □ 

Now we prove Lemma [2] Assume that 0o and 0i are members of W{S"'). From 
Lemma [T3l Ft is injective. On the other hand, Lemma |8] imphes that |V0o(3;)| < 
and |V0i(x)| < vr for all x G S'". Then Vcpoi^) = exp~-^(Fo(a;)) and V0i(a;) = 
exp~^(Fi(x)) are continuous. This implies Ft is continuous. Hence, by compactness 
and connectedness of S"", Ft must be an isomorphism on S"". Twice differentiability 
of (pt follows immediately from that of (po and (pi. 

4.4 Proof of Lemma |4] 

Fix z and let (p{x) = f{d{x,z)), for simplicity. We ffist prove c-convexity of (p. 
It is sufficient to show that for each xq G S*" there exists some y G S"" such that 
c(xo, y) + (p{xo) = mfx^s"{c{x, y) + (p{x)}. Thus we investigate the point minimizing 
c(x, y) + (p{x) for each fixed y. Denote the antipodal points of y and z by y' and z' , 
respectively. If x is different from y', then the gradient vector of c{x, y) + (p{x) with 
respect to x is 

Vx{c{x,y) + (p{x)] = Va:c{x,y) + Vxc{x,z)- — 

y/2c{x,Z) 

where denotes the gradient operator with respect to x. Note that the above 
expression makes sense for x = z and x = z' because f^^\0) = /'■^-'(tt) = and 
/ G C^([0,7r]). By LemmaO we know that Vxc{x,y) = —exp^^{y) and Vxc{x,z) = 
— exp~^(z). Hence the gradient vector Vx{c{x,y) + (p{x)} vanishes only if x lies on 
a great circle C that passes through y and z. Since the exceptional point y' is also 
included in C, we deduce that the point minimizing c{x, y) + (p{x) must belong to 
C. We fix a circular coordinate C, G (— vr, vr] representing a point on C such that 
y corresponds to ^ = 0. Let ^ and ( be the coordinates of x and z. We assume 
C G [0, vr] without loss of generality. Then the function c{x, y) + (p{x) can be written 
as 

hiO := c(x,2/) + 0(x) = ^ + /(min{|e-C|,|e-C + 27r|}). 

By the assumption for /, one can easily check that the second derivative of h is 
h^^\0 > (> long as ^ 7^ TT. Furthermore, we obtain h^^^n — 0) > 

h^^\-n + 0). Thus ^ = TT is not a point minimizing h. Furthermore, the point 
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minimizing h is unique because h is strictly convex over (— 7r,7r). We denote the 
minimizer by ^ (~^) ^] the corresponding point in S"" by xq. If y revolves 
along a great circle C passing through z, then xq must continuously revolve along 
C. Since ?/ can belong any great circle passing through z, we deduce that for each 
point Xq there exists some y E C such that the function c(x, y) + of x G S"" is 
minimized at Xq. This proves c-convexity of (p. 

Next we prove the gradient map G(f){x) is well defined and an isomorphism. Since 
(f) is differentiable everywhere, is well defined. Let x = exp^(te), where t G [0, vr] 
and e G T^S'" with |e| = 1. Then the gradient map is explicitly given by G(f,{x) = 
exp^ ((t + f^^\t))eY If if: moves from to vr, then t + f^^\t) moves from to vr 
monotonically because 1 + > for almost all t G [0, tt]. Hence G^j, : S"" ^ 

is an isomorphism. 

Lastly, is clearly twice differentiable whenever x ^ z and x ^ z'. This completes 
the proof. 



5 Discussion 



We briefly discuss the Jacobian inequality for general manifolds. 

In the proof of Theorem [H we have used the closed property of cost-convex func- 
tions (Lemma [1]), the Jacobian-ratio inequality (Lemma fTOll and log-concavity of the 
Jacobian of the exponential map (Lemma [TTj) . For ai iy non-negatively c r oss-cu rved 
(or time-convex-sliding-mountain) manifold defined in lKim and McCannI (120081 ). the 
former two lemmas are obtained in the same manner. However, Lemma [H] does not 
automatically follow from the non-negative cross- curvature condition. 

The author does not know if any Riemannian manifold with non-negative cross- 
curvature satisfies the Jacobian inequality. At least, any product space of 5" and 
satisfies the Jacobian inequality becaus e the non-negative cross- curvature condition 
is preserved for products of manifolds (IKim and McCanru ( l2008l )) and the Jacobian 
determinant of the exponential map is also factorized into the Jacobian determi- 
nant on each space. This fact may enable us to describe dependency structures of 
multivariate directional data in statistics. We leave such an extension for future 
research. 
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(a) A density function on the sphere. 



(b) Samples. 



Figure 1: Exact samphng. (a) a density function (a white region indicates high 
density) and (b) 2000 sampled data points. The c-convex function used is 0(a;) = 
0.5 cos(2(i(a;, ei)) + 0.5 cos(3(i(x, 62)) for x G 5*^, where ei and 62 denote unit vectors 
along the horizontal and vertical axes, respectively. See Subsection 13.31 for details. 
Only the northern hemisphere is drawn. The number of points on the northern 
hemisphere was 967 in this experiment. The program code was written in R and 
the computational time for sampling was about ten seconds. 
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(a) II = ei. 



(b) A = e,ej. 




(e) /X = 0.5ei, A = -O.Seie^. (f) A* = ei/3,A= (-626^ + e3eJ)/3. 

Figure 2: The quadratic-potential model. The white regions indicate high density. 
The figures represent (a) Concentration, (b) Negative dipole, (c) Positive dipole, (d) 
Complementary dipoles, (e) Unbalanced dipole and (f) General. 



20 



(a)Z = (ei),fc = (3),e = (l). (b)Z = 



(ei,e2),X=(3,3),^=(0.5,0.5). 




(c) Z = (ei, 62), = (9, 9), ^ = (0.5, 0.5). (d) Z = (d, (d + 62)/ V2), 

K = (30,4),^ = (0.5,0.5). 

Figure 3: High-frequency spherical gradient models. White regions indicate high 
density. 




(a) Northern hemisphere. (b) Southern hemisphere. 



Figure 4: The observed data points and the estimated density for the astronomic 
data. Both hemispheres are viewed from the northern side. White regions indicate 
high density. The points are the observed data. 
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